Web Spam: a Survey with Vision for the Archivist

Authors

  • András A. Benczúr
  • Dávid Siklósi
  • Jácint Szabó
  • István Bíró
  • Zsolt Fekete
  • Miklós Kurucz
  • Attila Pereszlényi
  • Simon Rácz
  • Adrienn Szabó
Abstract

Web archive quality is endangered by Web spam, a side effect of the high commercial value of top-ranked search-engine results, yet Web archivists rarely use Web spam filtering technologies. In this paper we make the first attempt to disseminate existing methodology and envision a solution for Web archives to share knowledge and unite efforts in Web spam hunting. We survey the state of the art in Web spam filtering, illustrated by the recent Web spam challenge data sets and techniques, and describe the filtering solution for archives envisioned in the LiWA (Living Web Archives) project.


Similar resources

A Survey on Web Spam and Spam 2.0

In the current scenario, the web is huge, highly distributed, open in nature, and changing rapidly. The open nature of the web is the main reason for its rapid growth, but it has also posed a challenge to Information Retrieval. One of the biggest challenges is spam. We focus here on a study of the different forms of web spam and its new variant called spam 2.0, existing detection methods proposed by differe...


Approaches for Web Spam Detection

Spam is a major threat to web security. Spammers abuse the web of trust through ever-evolving tactics for personal gain. In fact, there is a long chain of spammers running huge business campaigns on the web. Spam causes underutilization of search engine resources and creates dissatisfaction among the web community. Web Security being a prime challenge fo...


Presenting a Suitable Method for Classifying Advertising Emails Based on User Profiles

In general, whether an email is spam depends on whether it satisfies the client, not on the content of the client's email. Under this definition, problems arise in the field of marketing and advertising: for example, some advertising emails may be spam for some users and not spam for others. To deal with this problem, many researchers design an anti-s...


Spam 2.0 State of the Art

Spam 2.0 is defined as the propagation of unsolicited, anonymous, mass content to infiltrate legitimate Web 2.0 applications. A fake eye-catching profile in social networking websites, a promotional review, a response to a thread in online forums with unsolicited content, or a manipulated Wiki page are examples of Spam 2.0. In this paper, the authors provide a comprehensive survey of the state-...


A Survey on Web Spam Detection Methods: Taxonomy

Web spam refers to techniques that try to manipulate search engine ranking algorithms in order to raise a web page's position in search engine results. In the best case, spammers encourage viewers to visit their sites, and provide undeserved advertisement gains to the page owner. In the worst case, they use malicious content in their pages and try to install malware on the victim's machine....




Publication date: 2008